Abstract

Improvements in time resolution sometimes introduce short-range random noise into temporal
data sequences. Such noise affects the results of power-spectrum analyses and of the Detrended
Fluctuation Analysis (DFA), one of the useful methods for analyzing long-range correlations
in non-stationary sequences. The effects of noise are discussed on the basis of artificial temporal
sequences. Short-range noise prevents power-spectrum analyses from detecting long-range
correlations. The DFA can extract long-range correlations from noisy time sequences. The DFA also
gives the threshold time length below which the noise dominates. For practical analyses,
coarse-grained time sequences are shown to recover the long-range correlations.
I. INTRODUCTION



Studies of temporal data with long-range correlations have recently attracted research
interest in various fields, including physics, biology, the social sciences, and technology.
Researchers have been trying to observe power-law fluctuations in various temporal data, and
the origins of those power-law fluctuations have been one of the hot research subjects. The
increase in the amount of such data enables us to understand complex systems on the basis of
empirically obtained data.

Temporal data observed in complex systems are sometimes not stationary. The
detrended fluctuation analysis (DFA) is one of the methods for analyzing non-stationary
sequences to detect long-range correlations. It was first developed for analyzing the
long-range correlations in deoxyribonucleic acid (DNA) sequences. The method has
been employed for observing power-law properties in various time series with
non-stationarity.

The first step in the DFA method is to define the profile as the accumulated deviation
from the average of the data. The data sequence is divided into non-overlapping segments
of equal length $l$. Fitting the profile with a polynomial in each segment defines the local trend.
If the local trend is a straight line, the method is called the first-order DFA. We
employ the first-order DFA in this paper for simplicity.

Then we evaluate the standard deviation $F(l)$ of the profile from the local trend. If the
data sequence has power-law fluctuations, namely if the power spectrum $P(k)$ of the data
obeys the power law

$$P(k) \sim k^{-\gamma}, \qquad (1)$$

the dependence of $F(l)$ on the segment length $l$ is given by

$$F(l) \sim l^{\alpha}, \qquad \gamma = 2\alpha - 1. \qquad (2)$$
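
The first-order DFA steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the figures: the function names `dfa1` and `loglog_slope`, the choice of segment lengths, and the handling of a trailing incomplete segment (it is dropped) are all choices made for this example. Only the Python standard library is used.

```python
import math
import random


def dfa1(x, scales):
    """First-order DFA: return the fluctuation F(l) for each segment length l (l >= 3)."""
    n = len(x)
    mean = sum(x) / n
    # Profile: accumulated deviation from the average of the data.
    profile = []
    acc = 0.0
    for v in x:
        acc += v - mean
        profile.append(acc)

    fluct = []
    for l in scales:
        n_seg = n // l  # non-overlapping segments; an incomplete last one is dropped
        t_mean = (l - 1) / 2.0
        t_var = sum((t - t_mean) ** 2 for t in range(l))
        sq_sum = 0.0
        for s in range(n_seg):
            seg = profile[s * l:(s + 1) * l]
            y_mean = sum(seg) / l
            # Least-squares straight line through the segment: the local trend.
            slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg)) / t_var
            sq_sum += sum((y - y_mean - slope * (t - t_mean)) ** 2
                          for t, y in enumerate(seg))
        fluct.append(math.sqrt(sq_sum / (n_seg * l)))
    return fluct


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Uncorrelated Gaussian noise should give an exponent close to 0.5.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
scales = [8, 16, 32, 64, 128, 256]
alpha = loglog_slope(scales, dfa1(white, scales))
print(f"alpha for white noise: {alpha:.2f}")
```

The exponent is read off as the slope of log F(l) against log l, which is why the fit is done on logarithms rather than on the raw values.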

An increase in the amount of data sometimes means an improvement in time resolution. How
does the improvement in time resolution contribute to understanding long-range correlations
in data sequences? Let us consider data traffic in the Internet, for instance. Internet traffic
had been thought to be uncorrelated and to be modeled by a Poisson process, because hosts
are assumed to send data packets randomly. The validity of this assumption has clearly been
lost on the basis of various experimental measurements. Power-law properties of Internet
traffic have been investigated instead.

In general, Internet traffic data are collected by a software tool called MRTG (Multi Router
Traffic Grapher) [3]. It communicates with routers and switches through SNMP (Simple
Network Management Protocol). With its default setting, MRTG collects the amount
of packets as 5-minute averages. Shortening the period for collecting data will include various
types of irregularity: asynchronous behavior of clients and routers, external noise
such as the behavior of users, and statistical errors of the observation. Internet traffic, in
fact, has been reported to be random on time scales smaller than 100 ms [8].

The purpose of this paper is to understand how randomness or irregularity on short
time scales affects the results of the DFA and power-spectrum analyses. To investigate the
effects of short-range noise, artificial data with long-range correlations and
short-range randomness are prepared, and the results of the DFA and of standard power-spectrum
analyses of the artificial data are investigated.

The organization of this paper is as follows. The Fourier Filter Method (FFM) is employed
to generate a time sequence with a power-law correlation in §2, where the results of the standard
power-spectrum analysis and the DFA method are investigated. Random noise on short time scales
is introduced into the time sequence in §3 by changing the filter function in the FFM, and the
power-spectrum analysis is shown to be strongly affected by the noise. A
practical way to eliminate the noise is averaging over short time scales. The coarse-grained
sequence is investigated in §4. Section 5 is devoted to a summary and discussion.

II. FOURIER FILTER METHOD 

We generate artificial time sequences with power-law correlations to observe the effects
of short-range noise on time sequences with long-range correlations. The Fourier Filter
Method (FFM) [9, 10] is one of the methods for generating such sequences. The method is
so simple that we can introduce various types of spectra into sequences. The method was
improved to extend the range of correlation [11]. We employ the original form of the
method for simplicity.

An uncorrelated random sequence of length $T$ is prepared as $\{u_t\}$ ($t = 0, 1, \ldots, T-1$) in
the FFM. The correlation function of this sequence is given by

$$C(\tau) = \sum_{t=0}^{T-1} u_t u_{t+\tau}. \qquad (3)$$

The Fourier components of the correlation are given as

$$C_k = \frac{1}{T} \sum_{\tau=0}^{T-1} C(\tau)\, e^{-2\pi i k \tau / T} = \frac{1}{T} |u_k|^2, \qquad (4)$$

where $u_k$ is a Fourier component of the sequence $\{u_t\}$. The sequence $\{u_t\}$ is prepared
randomly, so the Fourier components of the correlation are almost flat.

A correlation is implemented in the sequence by changing the amplitudes of the Fourier
components $\{u_k\}$. To introduce a power-law correlation, a filter is defined as

$$S(k) = k^{-\gamma}. \qquad (5)$$

The new sequence $\{\eta_t\}$ is defined through its Fourier components $\eta_k$ and the filter:

$$\eta_k = [S(k)]^{1/2} u_k. \qquad (6)$$

The new sequence $\{\eta_t\}$ bears power-law fluctuations

$$P(k) \sim k^{-\gamma}. \qquad (7)$$



We generate a sequence with length T = 2^° ~ 10^ in this paper. A part of the generated 
sequence by FFM with $\gamma = 0.9$ is shown in Fig. 1. It does not look like a simple random
sequence. There seem to be long-range correlations.
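
The generation procedure can be sketched as follows. This is an illustrative sketch only: the amplitude of each Fourier component of an uncorrelated sequence is scaled by the square root of a power-law filter, so that the power spectrum is proportional to k^(-gamma). The small radix-2 `fft` helper, the Gaussian choice for the random sequence, the symmetric wave number min(k, T - k) that keeps the output real, the zeroed k = 0 component, and the short length T = 2^12 are all assumptions of this example. In practice a library routine such as `numpy.fft.fft` would normally replace the hand-written transform.

```python
import cmath
import math
import random


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def ffm(u, gamma):
    """Filter an uncorrelated sequence u so that its spectrum goes as k**(-gamma)."""
    n = len(u)
    u_k = fft(u)
    eta_k = [0j] * n  # k = 0 stays zero (assumption: zero-mean output)
    for k in range(1, n):
        kk = min(k, n - k)  # symmetric wave number keeps the output real
        eta_k[k] = math.sqrt(kk ** (-gamma)) * u_k[k]
    eta_t = fft(eta_k, inverse=True)
    return [z.real / n for z in eta_t]  # 1/n normalizes the inverse transform


rng = random.Random(1)
T = 2 ** 12  # illustrative length only
u = [rng.gauss(0.0, 1.0) for _ in range(T)]
eta = ffm(u, gamma=0.9)
```

Because only the amplitudes are rescaled and the random phases of the original sequence are kept, the output remains a random sequence whose correlations are set entirely by the filter.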
FIG. 1: A part of the sequence generated by FFM with $\gamma = 0.9$.

The power-spectrum analysis is one of the most standard methods for detecting power-law
properties in time sequences. The power spectrum of the new sequence $\{\eta_t\}$ is shown in
Fig. 2. It shows a clear power-law dependence. A least-squares fit to all data
points gives the expected value of the power exponent, $\gamma = 0.90$.
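
A least-squares estimate of the exponent amounts to fitting a straight line to log P(k) against log k. The sketch below illustrates this on a synthetic spectrum rather than on the sequence of the text: the spectrum is assumed to be an exact k^(-0.9) law multiplied by an exponentially distributed factor, which is the scatter expected for the periodogram of Gaussian noise. The function name `fit_exponent`, the number of points, and the seed are choices made for this example.

```python
import math
import random


def fit_exponent(ks, ps):
    """Return gamma from a least-squares line fitted to log P(k) versus log k."""
    lx = [math.log(k) for k in ks]
    ly = [math.log(p) for p in ps]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return -slope  # P(k) ~ k**(-gamma), so gamma is minus the slope


# Synthetic spectrum (assumption): power law times exponential scatter.
rng = random.Random(2)
ks = list(range(1, 2 ** 12 + 1))
ps = [k ** -0.9 * rng.expovariate(1.0) for k in ks]
gamma_fit = fit_exponent(ks, ps)
print(f"fitted gamma: {gamma_fit:.2f}")
```

With wave numbers sampled uniformly, most points lie at large k, so a fit to all data points is weighted towards the short-range end of the spectrum.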

FIG. 2: The power spectrum of the sequence generated by FFM with $\gamma = 0.9$. The line shows
the result of the least-squares fit to all data points, which gives the expected value
$\gamma = 0.90$. Note that the number of data points is reduced.



Figure 3 shows the result of the DFA. The observed exponent $\alpha$ corresponds to
the expected value $\alpha = (\gamma + 1)/2 = 0.95$. Namely, if the power-law correlation covers the
whole range of the data, the result of the power spectrum coincides with the result of the
DFA.

FIG. 3: The result of the DFA of the sequence generated by FFM with $\gamma = 0.9$.

III. FOURIER FILTER METHOD WITH SHORT-RANGE NOISE



Various types of irregularity will be included in data by improving the time resolution of
observation. To investigate the effect of short-range irregularity on long-range correlations,
let us change the filter $S(k)$ as follows to include short-range noise:

$$S(k) = k^{-\gamma} + k_c^{-\gamma}, \qquad (8)$$

where $k_c$ is a constant corresponding to a threshold of the filter. The filter $S(k)$ is almost
constant for wave numbers larger than $k_c$. Namely, the randomness in $\{u_k\}$ is not suppressed
for wave numbers larger than $k_c$. The fluctuation of the new sequence will obey a power law
in the longer range ($k < k_c$), but is random in the shorter range ($k > k_c$). Figure 4 shows
the filter with threshold S{k) for kc = lO^^T. 
FIG. 4: Fourier filter with threshold for kc = lO'^T.

Figure 5 shows a part of the sequence generated by FFM with threshold. The amplitudes
of the high-frequency random modes are larger than those in the sequence generated by FFM
without threshold, so the sequence looks random at a glance in comparison with Fig. 1.

FIG. 5: A part of the sequence generated by FFM with threshold.

The effects of short-range randomness become obvious in the power spectrum. Figure 6
shows the power spectrum of the sequence generated by FFM with threshold. The spectrum
is almost flat. The short-range random noise dominates the spectrum and prevents us from
detecting the long-range correlation.

A least-squares fit to all data points of the spectrum gives the exponent $\gamma \approx 0.02$.
Even if the fit is limited to the long-range data points with $k < k_c$, the obtained exponent
$\gamma \approx 0.66$ is smaller than the expected value 0.9. The
power-spectrum analysis, namely, is strongly affected by short-range noise. It seems to be
difficult to detect long-range correlations by analyzing the power spectrum.
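
One way to see why even the restricted fit underestimates the exponent is to look at the local log-log slope of the filter with threshold. The short derivation below assumes that the spectrum simply follows the filter, $P(k) \propto S(k) = k^{-\gamma} + k_c^{-\gamma}$; the symbol $\gamma_{\mathrm{eff}}$ is introduced here only for illustration.

```latex
\gamma_{\mathrm{eff}}(k) \equiv -\frac{d \ln S(k)}{d \ln k}
  = \frac{\gamma\, k^{-\gamma}}{k^{-\gamma} + k_c^{-\gamma}}
  = \frac{\gamma}{1 + (k/k_c)^{\gamma}}
```

Under this assumption the local exponent approaches $\gamma$ only for $k \ll k_c$, falls to $\gamma/2$ at $k = k_c$ (0.45 for $\gamma = 0.9$), and tends to zero for $k \gg k_c$. A fit over $k < k_c$ therefore averages slopes between $\gamma/2$ and $\gamma$, which is consistent with a value such as 0.66 lying below 0.9.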




FIG. 6: The power spectrum of the sequence generated by FFM with threshold. The solid line
corresponds to $\gamma = 0.9$. The broken line corresponds to the least-squares fit to the data
points with $k < k_c$. The exponent obtained by the fit is $\gamma = 0.66$. Note that the number
of data points is reduced.



The DFA is more useful in this case than the power-spectrum analysis. Figure 7
shows the result of the DFA. It shows a crossover between two regions. The exponent is
$\alpha \approx 0.5$ in the shorter region, which is the value for random sequences. In the longer
region it is $\alpha = (\gamma + 1)/2 = 0.95$, which corresponds to the expected correlation. The
crossover point of these two regions is located at the threshold $k_c$. The DFA, namely, detects the
existence of the long-range correlation and gives the threshold above which random noise
dominates.
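
The text does not say how the crossover point is read off from the DFA result. One possible approach, sketched here purely as an assumption, is to fit two straight lines to log F(l) against log l, scan over the break position, and take the intersection of the best pair of lines as the crossover length. The function names and the synthetic F(l) curve (exponent 0.5 below a crossover at l = 256 and 0.95 above it) are hypothetical illustration data, not results from the paper.

```python
import math


def line_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def sse(xs, ys, slope, icpt):
    """Sum of squared residuals of a fitted line."""
    return sum((y - slope * x - icpt) ** 2 for x, y in zip(xs, ys))


def find_crossover(ls, fs):
    """Two-line fit of log F(l) vs log l; returns (alpha_short, alpha_long, l_cross)."""
    xs = [math.log(l) for l in ls]
    ys = [math.log(f) for f in fs]
    best = None
    for i in range(2, len(xs) - 1):  # at least two points on each side
        a1, b1 = line_fit(xs[:i], ys[:i])
        a2, b2 = line_fit(xs[i:], ys[i:])
        err = sse(xs[:i], ys[:i], a1, b1) + sse(xs[i:], ys[i:], a2, b2)
        if best is None or err < best[0]:
            best = (err, a1, b1, a2, b2)
    _, a1, b1, a2, b2 = best
    x_cross = (b1 - b2) / (a2 - a1)  # intersection; assumes the two slopes differ
    return a1, a2, math.exp(x_cross)


# Synthetic, continuous F(l) with a crossover at l_c = 256 (illustration only).
ls = [2 ** j for j in range(2, 15)]
l_c = 256
fs = [l ** 0.5 if l <= l_c else l_c ** (0.5 - 0.95) * l ** 0.95 for l in ls]
alpha_short, alpha_long, l_cross = find_crossover(ls, fs)
print(alpha_short, alpha_long, l_cross)
```

If the wave number k counts periods over the whole sequence of length T, a crossover length l_cross corresponds to a wave number of roughly T / l_cross.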




FIG. 7: The result of the DFA analysis of the sequence generated by FFM with threshold. 



IV. COARSE-GRAINED SEQUENCE 

Real observed data, in general, will contain various types of irregularity in the short
range. The simplest practical way to eliminate such irregularity is to sum the data over some
short length. We examine whether this simple method preserves the long-range correlation in
the original sequence, as one would expect.

The sequence generated in the previous section contains short-range random noise together with
long-range correlations. The DFA gives the threshold $k_c$, above which short-range noise
dominates. Summing the data over segments of length $T/k_c$ will eliminate the noise. Figure
8 shows the coarse-grained data obtained by summation over segments of length $T/k_c$. It shows
the existence of long-range correlations.
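
The coarse-graining step itself is a block sum, sketched below. The function name `coarse_grain`, the illustrative values of T and k_c, the placeholder data, and the decision to drop an incomplete final block are assumptions of this example. The idea behind the step is that the sum of m uncorrelated values grows only like the square root of m, whereas a slowly varying component grows roughly like m.

```python
def coarse_grain(x, m):
    """Sum the sequence over consecutive non-overlapping blocks of length m."""
    n_blocks = len(x) // m  # assumption: an incomplete last block is dropped
    return [sum(x[b * m:(b + 1) * m]) for b in range(n_blocks)]


T, k_c = 2 ** 12, 2 ** 5  # illustrative values, not those of the text
x = list(range(T))        # placeholder data
y = coarse_grain(x, T // k_c)
print(len(y))             # prints 32, i.e. k_c blocks
```

After coarse-graining with block length T/k_c the sequence has k_c points, so the wave numbers that remain are those at or below the threshold.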

Figure 9 shows the power spectrum of the coarse-grained sequence. A least-squares fit to all
data points gives the exponent $\gamma = 0.98$, which is slightly different from the
imposed value $\gamma = 0.9$.

The DFA of the coarse-grained sequence gives the exponent $\alpha = 0.95$, as shown
in Fig. 10. The power-law correlation with $\gamma = 0.9$ installed into the sequence is recovered
and is detected by the DFA method.

FIG. 8: A part of the coarse-grained sequence. Note that the unit of the t-axis is $T/k_c$ times
larger than in Fig. 1.




FIG. 9: The power spectrum of the coarse-grained sequence. The solid line corresponds to
$\gamma = 0.9$. The broken line corresponds to the least-squares fit to all data points. The
exponent obtained by the fit is $\gamma = 0.98$.



V. SUMMARY AND DISCUSSION 



Power-law correlations contained in temporal data of various dynamical systems have
attracted research interest in various fields. An increase in the amount of data
sometimes means an improvement in temporal resolution, which in turn
sometimes introduces asynchronous irregularity into the data. Such short-range noise
may prevent us from analyzing long-term correlations.

Short-range noise is shown to strongly affect the power-spectrum analysis. It is very
difficult to detect long-range correlations if the data contain such short-range noise.

FIG. 10: The result of the DFA of the coarse-grained sequence.

The detrended fluctuation analysis (DFA) can detect both the short-range noise and the
long-range correlations. These two ranges meet at a threshold, and the DFA also gives the
threshold dividing them.

Finally we discuss the practicality of this work. E-mail is one of the most
popular services on the Internet. Users send e-mail messages to the e-mail servers of their
own organizations or to those operated by Internet service providers. E-mail servers usually
record e-mail sending requests every second. Every record contains the sender and receiver
addresses and the message size. Namely, the amount of sent messages is recorded every
second.

FIG. 11: The result of the DFA for e-mail messages at an e-mail server. In the range shorter than
one hour, random noise dominates the data. The power-law correlation can be found in the range
longer than one hour.

A user sends a message to another user. The receiver responds to the message, quoting
the received message, after some delay. Therefore the sequence of the amount of
e-mail messages will contain long-range correlations.

Figure 11 shows the result of the DFA for the amount of e-mail messages at an
e-mail server. In the range shorter than one hour, the exponent is $\alpha \approx 0.5$, which
corresponds to that for random noise. In the range longer than one hour, a long-range correlation
with $\alpha \approx 0.95$ can be found.

The number of data points in this observation is of order 10^. The short range up to
the order 10^ is dominated by random noise. Namely, the study in this paper with artificial
sequences will be applicable to real observed data. A detailed analysis of the case of e-mail
messages will be discussed elsewhere.